Supplementary Material for Parameter-Efficient Masking Networks
For all backbones used in our experiments, we follow their default training settings. We set the maximum learning rate to 0.0001, the batch size to 256, and the total number of epochs to 200. We vary the hidden dimension (256 or 512) and the depth (6 or 8) across the configurations reported in the experiments section. The weight decay and momentum are set to 0.0005 and 0.9, respectively.
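For reference, the settings above can be collected into a small configuration sketch. This is only an illustration, not our released code: the names `TrainConfig` and `config_grid` are hypothetical, and enumerating all four hidden-dimension/depth combinations is an assumption, since the text states only that both values of each are used. The optimizer and learning-rate schedule follow each backbone's defaults and are therefore left out.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters listed above."""
    hidden_dim: int                # 256 or 512
    depth: int                     # 6 or 8
    max_lr: float = 1e-4           # maximum learning rate
    batch_size: int = 256
    epochs: int = 200
    weight_decay: float = 5e-4
    momentum: float = 0.9


def config_grid(hidden_dims=(256, 512), depths=(6, 8)):
    """Return one config per (hidden_dim, depth) pair.

    Assumption: the full cross product is enumerated here purely for
    illustration; the experiments may use only a subset of these pairs.
    """
    return [TrainConfig(hidden_dim=h, depth=d)
            for h, d in product(hidden_dims, depths)]


configs = config_grid()
for cfg in configs:
    print(cfg)
```

Keeping the shared values as dataclass defaults means only the hidden dimension and depth need to be specified for each run.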